scored items present special problems when an analysis of correlational interrelationships among the items is attempted. Two general methods of analyzing binary data are proposed by Horst to partial out the effects of differences in item difficulties: (1) a least squares simplex data matrix solution, and (2) a least squares simplex covariance matrix solution. Of these, the first was selected for study using (1) a regression approach, (2) a raw data approach, and (3) the computational algorithm for the raw data matrix approach. The results indicate that Horst's modification clearly induces an effect that contaminates the common factor structure of the variables. Further, the findings also indicate that image, alpha, and principal components analysis of correlation matrices obtained from binary data matrices are all satisfactory methods of analysis without the modification. This may be an important finding, since it tends to confirm earlier empirical findings concerning the varying difficulties of binary items. (CK)
STUDIES OF HORST'S PROCEDURE FOR BINARY DATA ANALYSIS

William M. Gray 
Richard J. Hofmann 

BACKGROUND 

Most responses to educational and psychological test items may be represented in binary form. However, such dichotomously scored items present special problems when an analysis of correlational interrelationships among the items is attempted. For example, when a test is intended to measure a unitary trait and also contains items of varying "difficulty," item intercorrelations will not, in general, be homogeneous. The problems of choosing a "proper" coefficient of interrelationship in view of the "contaminating" effect of item "difficulty" appear as yet not to have been solved (Horst, 1965).

Carroll (1961) suggests tetrachoric correlations instead of product-moment or other coefficients because tetrachorics avoid certain problems arising from varying "difficulty" levels. However, tetrachoric correlations assume "latent" bivariate normal distributions between pairs of items, and it is possible that tetrachoric correlation matrices may not even be Gramian. Horst (1965) and Guttman (1950) strongly criticize the analysis of tetrachoric r's for binary data.
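A matrix is Gramian when it is positive semidefinite, that is, when none of its eigenvalues is negative. The following is a minimal sketch of that check, assuming Python with NumPy and an arbitrary numerical tolerance. The 3 x 3 matrix is a made-up set of pairwise coefficients, not data from this study, chosen because no three real variables could have these correlations jointly. Coefficients estimated pair by pair, as tetrachorics are, can combine into matrices of this kind.

```python
import numpy as np


def is_gramian(r: np.ndarray, tol: float = 1e-10) -> bool:
    """Return True if the symmetric matrix r is positive semidefinite (Gramian)."""
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues.
    smallest = np.linalg.eigvalsh(r).min()
    return bool(smallest >= -tol)


# Made-up pairwise coefficients: item 1 correlates .9 with items 2 and 3,
# yet items 2 and 3 correlate -.9 with each other, which is impossible jointly.
r_bad = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, -0.9],
                  [0.9, -0.9, 1.0]])

print(is_gramian(r_bad))      # False
print(is_gramian(np.eye(3)))  # True
```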

Items with like "difficulty" indices can, in general, be correlated more highly than items with unlike "difficulty" indices. In turn, differences in "difficulties" across items may be represented as extra factors in a factor analysis of the items (Ferguson, 1941; Guttman, 1950; Horst, 1965).
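One standard way to see why difficulty matters is to look at the ceiling that item marginals place on the phi coefficient, which is the product-moment r computed on 0/1 scores. For two items with proportions passing p_i <= p_j, phi cannot exceed sqrt(p_i q_j / (p_j q_i)), where q = 1 - p. The sketch below assumes Python with NumPy. The two five-examinee response vectors are invented for illustration and form a perfect Guttman pattern. Even so, their phi only reaches the ceiling of .25, because one item is passed by 80% of examinees and the other by 20%.

```python
import numpy as np


def phi(x, y) -> float:
    """Phi coefficient of two 0/1 item score vectors (Pearson r on binary data)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def phi_max(p_i: float, p_j: float) -> float:
    """Largest phi attainable by two items with proportions passing p_i and p_j."""
    lo, hi = min(p_i, p_j), max(p_i, p_j)
    return float(np.sqrt(lo * (1 - hi) / (hi * (1 - lo))))


# Invented responses: everyone who passes the hard item also passes the easy one.
easy = [1, 1, 1, 1, 0]   # p = .8
hard = [1, 0, 0, 0, 0]   # p = .2

print(round(phi(easy, hard), 4))      # 0.25
print(round(phi_max(0.8, 0.2), 4))    # 0.25
print(round(phi_max(0.5, 0.5), 4))    # 1.0
```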



which were used as input for the second program. 

The second program consisted of a series of three different multivariate methods, each of which can be loosely termed a type of factor analysis. For each set of data, the analyses proceeded both from the residual correlation matrices resulting from Horst's procedure and from the matrix of phi coefficients computed from the original (permuted) binary data matrices. The number of common factors was held constant for each set of data by inputting to the program the number of Guttman-type scales built into that set of data. Factor analytic methods used were: (1) image analysis following the algorithm developed by Harris (1962); (2) alpha factor analysis following the algorithm given by Kaiser and Caffrey (1965); and (3) principal components following the algorithm presented by Hotelling (1933) and Harman (1967). Transformations applied to each of the factor solutions were: (1) normal varimax as discussed by Kaiser (1958) and (2) a case II independent cluster solution (Harris and Kaiser, 1964) applied to the principal axis representation of the major product of the initial factor loading matrix for each solution (Pruzek, 1967). To summarize, then, each of four sets of artificial binary data was analyzed with and without Horst's procedure, using each of three factoring procedures with two analytic transformations for each. Thus, twenty-four possible factor pattern matrices were generated for study.



RESULTS AND DISCUSSION 



IMAGE ANALYSIS 

Binary Data Matrix. In the ensuing discussion, reference will be made to scale factors (factors built into each data set). Data sets A and B have two scale factors: (1) For data set A, scale factor a consists of items 1, 3, 5, 7, 9 and scale factor b consists of items 2, 6, 8, 10. (2) For data set B, scale factor a consists of items 1, 3, 7, 8, 10 and scale factor b consists of items 2, 5, 6, 9. Data sets C and D have three scale factors: (1) For data set C, scale factor a consists of items 1, 3, 4, 8; scale factor b consists of items 7, 9, 11, 12; scale factor c consists of items 2, 5, 8, 10. (2) For data set D, scale factor a consists of items 8, 10, 12; scale factor b consists of items 2, 3, 8, 7; scale factor c consists of items 1, 4, 5, 9, 11.

Tables 4, 5, 6, and 7 present the analyses for data sets A, B, C, and D, respectively. The lowercase letter next to each difficulty index identifies the scale factor into which the item was built. For the artificial data in which two factors were built, two factors were extracted; for the data with three factors, three factors were extracted. An attempt was made to label the columns of the solution matrices with the letters of the scale factors with which they were most closely associated. Thus, if the first column of a varimax solution appears to represent scale factor a for the given data set, the column is referred to as varimax solution a, and similarly for cluster solution a.
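The labeling rule above could be mechanized in several ways, and the study does not say how it was done. The sketch below shows one hypothetical rule, assuming Python with NumPy: each column receives the letter of the scale factor whose built-in items have the highest mean loading on it. The membership lists are those given in the text for data set A. The loading matrix is invented for illustration and is not taken from the study's tables.

```python
import numpy as np

# Scale factor memberships for data set A as listed in the text (item numbers start at 1).
SCALES_A = {"a": [1, 3, 5, 7, 9], "b": [2, 6, 8, 10]}


def label_columns(loadings, scales):
    """Hypothetical rule: label each column by the scale whose items load highest on average."""
    a = np.asarray(loadings, dtype=float)
    labels = []
    for col in range(a.shape[1]):
        # Mean (signed) loading of each scale's items on this column.
        means = {name: a[[i - 1 for i in items], col].mean()
                 for name, items in scales.items()}
        labels.append(max(means, key=means.get))
    return labels


# Invented loadings for ten items: scale b items load on the first column,
# scale a items on the second; rows for unlisted items stay at zero.
demo = np.zeros((10, 2))
demo[[1, 5, 7, 9], 0] = 0.6
demo[[0, 2, 4, 6, 8], 1] = 0.7
print(label_columns(demo, SCALES_A))  # ['b', 'a']
```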



The hypothesized scale factors of data sets A, B, and C are clearly well defined by the normal varimax solution. Hypothesized factors defined by the pattern matrix of the independent cluster solution for data sets A and B are clear, with the exception of the negative loading for variable 1 of data set B, which has the highest "difficulty" index of the variables in the set. For data sets C and D, the independent cluster solutions are acceptable; but for data set D, the low negative entries on cluster a result from the variables with high difficulty indices on scale factor b and the variables with low difficulty indices on scale factor c.

Residual Correlation Matrix. Horst's procedure produced singular matrices for data sets B and C. Because these matrices were singular, image analysis was not applicable. For data sets A and D, Horst's procedure resulted in bipolar factors. In an attempt to "partial out" the varying difficulties, Horst's process also produced factorially complex varimax and independent cluster solutions.
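Singularity rules out image analysis because the image computations require the inverse of the correlation matrix R. The minimal sketch below assumes Python with NumPy and an arbitrary condition-number cutoff. It computes Guttman's image covariance matrix G = R + S^2 R^-1 S^2 - 2 S^2, where S^2 is the diagonal matrix of reciprocals of the diagonal of R^-1, and it refuses a singular R. The subsequent factoring step of Harris (1962) is not reproduced here, and the example correlations are invented.

```python
import numpy as np


def image_covariance(r, max_condition=1e12):
    """Guttman image covariance matrix G = R + S2 R^-1 S2 - 2 S2 (R must be nonsingular)."""
    r = np.asarray(r, dtype=float)
    if np.linalg.cond(r) > max_condition:
        raise np.linalg.LinAlgError(
            "correlation matrix is singular or nearly so; image analysis not applicable")
    r_inv = np.linalg.inv(r)
    # S2: anti-image variances, the reciprocals of the diagonal of R^-1.
    s2 = np.diag(1.0 / np.diag(r_inv))
    return r + s2 @ r_inv @ s2 - 2.0 * s2


# Invented 3 x 3 correlations; the diagonal of G holds the squared multiple correlations.
r = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
print(np.round(np.diag(image_covariance(r)), 3))

# Two identical items make R singular, so the computation is refused.
try:
    image_covariance(np.array([[1.0, 1.0], [1.0, 1.0]]))
except np.linalg.LinAlgError as err:
    print("not applicable:", err)
```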

For data set A, the varimax and independent cluster solu- 
tion for the first Harris factor a can be discussed in terms of 
the difficulty indices of the items of scale factor a and scale 
factor b. The high positive loadings for varimax solution a 
and cluster solution a correspond to the items of scale factor a 
having low difficulty indices, while the high negative loadings 
correspond to the items of scale factor b having high difficulty 
indices. Similarly, the high positive loadings of varimax so- 
lution b and cluster solution b are associated with those items 
of scale factor b having low difficulty indices, and the high 
negative loadings are associated with those items of scale fac- 
tor a having high difficulty indices. 

Both varimax and independent cluster solutions of the Harris factors of data set D have bipolarity in each column of the solution matrices. The high positive loadings of varimax solution a and cluster a are associated with scale factor a, and the high negative loadings are associated with those items of scale factor c having low difficulty indices. For varimax solution b and cluster solution b, the high positive loadings are associated with those items of scale factor b having high indices of difficulty. The high negative loadings on varimax solution b and cluster b are associated with those items of scale factor c having high difficulty indices. The high positive loadings of varimax solution c and cluster solution c are associated with those items of scale factor c having low indices of difficulty, while the high negative loadings are associated with those items of scale factor b having difficulty indices around .50.

Horst's procedure produced two singular matrices and rendered the other two data matrices uninterpretable. There was no recovery of factors when using the Horst modification, and the procedure appeared to cause bipolarity and splitting of factors. Data modified by Horst's procedure could not be clearly analyzed through the use of image analysis. However, the unadjusted data analyzed by image analysis were clearly interpretable, and in all four sets of data, recovery of the artificial factors was possible despite widely varying item difficulties.

ALPHA ANALYSIS 

Binary Data Matrix. Alpha factor analysis of the unmodified data for data sets A, B, and C with a normal varimax solution allowed complete recovery of the scale factors. For each set of data, the common factor structure was well defined.
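Alpha factor analysis (Kaiser and Caffrey, 1965) takes the leading eigenvectors J and eigenvalues M of H^-1 (R - U^2) H^-1, where H^2 holds the current communalities and U^2 = I - H^2. It forms loadings H J M^(1/2) and then iterates on the communalities. The sketch below is a minimal version that assumes Python with NumPy, squared multiple correlations as starting communalities, and arbitrary convergence settings. It stops with an error when a communality reaches 1, which is the Heywood case mentioned in the summary. The one-factor correlation matrix in the usage lines is invented, so the loadings it should recover are known in advance.

```python
import numpy as np


def alpha_factor(r, n_factors, max_iter=500, tol=1e-8):
    """Iterative alpha factoring; returns (loadings, communalities)."""
    r = np.asarray(r, dtype=float)
    # Starting communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    for _ in range(max_iter):
        h = np.sqrt(h2)
        # R - U^2 with U^2 = I - H^2 puts the communalities on the diagonal.
        reduced = r - np.diag(1.0 - h2)
        scaled = reduced / np.outer(h, h)          # H^-1 (R - U^2) H^-1
        vals, vecs = np.linalg.eigh(scaled)
        keep = np.argsort(vals)[::-1][:n_factors]  # largest roots first
        vals, vecs = vals[keep], vecs[:, keep]
        loadings = h[:, None] * vecs * np.sqrt(np.clip(vals, 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        if np.any(new_h2 >= 1.0):
            raise ValueError("Heywood case: a communality reached 1")
        if np.max(np.abs(new_h2 - h2)) < tol:
            return loadings, new_h2
        h2 = new_h2
    return loadings, h2


# Invented one-factor model: R = l l' with unit diagonal.
l = np.array([0.8, 0.7, 0.6, 0.5])
r = np.outer(l, l)
np.fill_diagonal(r, 1.0)
loadings, h2 = alpha_factor(r, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 3))
```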

In data set D, the influence of high "difficulty" indices is seen in the loadings of variables 11 and ... on varimax solution factor a. Although varimax solution factor a is clearly the hypothesized scale factor a, the effects of scale factor c are present.

Alpha analysis with an independent cluster solution, although acceptable and clearly defining the hypothesized factors, tended to lack a clear positive manifold for some clusters: cluster a for data set A, cluster b for data set B, clusters a, b, and c for data set C, and clusters a and c for data set D.

Residual Correlation Matrix . Horst's modification, as 
noted in the image section, produced two singular matrices; con- 
sequently, no alpha analysis was performed on data sets B and C. 

Just as bipolarity occurred in the image analyses, so also did it occur in the alpha analyses. Both the varimax and independent cluster solutions had bipolarity on every factor and cluster.
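Bipolarity here means that a factor or cluster carries sizeable loadings of both signs. The minimal sketch below shows how such columns might be flagged. It assumes Python with NumPy and a cutoff of .30, which is our arbitrary choice rather than the authors' criterion, and the loading matrix is invented.

```python
import numpy as np


def bipolar_columns(loadings, cutoff=0.30):
    """Flag columns that have at least one loading >= cutoff and one <= -cutoff."""
    a = np.asarray(loadings, dtype=float)
    has_positive = (a >= cutoff).any(axis=0)
    has_negative = (a <= -cutoff).any(axis=0)
    return (has_positive & has_negative).tolist()


demo = np.array([[0.70, 0.10],
                 [-0.60, 0.50],
                 [0.20, 0.60]])
print(bipolar_columns(demo))  # [True, False]
```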

There was no clear recovery of factors using the Horst modification. The procedure caused bipolarity as well as splitting of the factors. Without the Horst process, the data analyzed by alpha analysis were interpretable, and all four sets of artificial factors were completely recoverable with either a varimax or an independent cluster solution.



PRINCIPAL COMPONENTS 




Binary Data Matrix. For data sets A, B, and C, the hypothesized factors are clearly defined by both varimax and independent cluster solutions. Both transformations for data set D tend to have minor row complexity and lack a clear positive manifold on some factors.

Residual Correlation Matrix. As in the previous analyses, Horst's modification produces bipolarity for each factor and cluster. No factors were clearly recoverable, and each solution was factorially complex. Principal components analysis of the unmodified binary data matrix yielded factors that were clearly the hypothesized factors.
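For reference, the unrotated principal components (Hotelling, 1933) of a correlation matrix are its leading eigenvectors scaled by the square roots of their eigenvalues. The minimal sketch below assumes Python with NumPy. Making each column sum positive is our arbitrary sign convention. The block-structured correlation matrix is invented so that each component should line up with one block.

```python
import numpy as np


def principal_components(r, n_components):
    """Unrotated principal component loadings and their eigenvalues."""
    r = np.asarray(r, dtype=float)
    vals, vecs = np.linalg.eigh(r)
    keep = np.argsort(vals)[::-1][:n_components]   # largest eigenvalues first
    vals, vecs = vals[keep], vecs[:, keep]
    loadings = vecs * np.sqrt(vals)
    # Eigenvector signs are arbitrary; flip columns whose loadings sum negative.
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return loadings * signs, vals


# Invented correlations: items 1-2 correlate .6, items 3-4 correlate .4,
# and the two pairs are uncorrelated.
r = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.4, 1.0]])
loadings, eigenvalues = principal_components(r, n_components=2)
print(np.round(eigenvalues, 3))  # [1.6 1.4]
print(np.round(loadings, 3))
```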

SUMMARY

Horst's modification clearly induces an effect that contaminates the common factor structure of the variables. Analysis of the item difficulties and the factorial structure of the solutions after Horst's modification seems to indicate that an influence due to item difficulties is involved in the common factor distortion. That is, Horst's procedure for partialing out the effects of item difficulty appears to complicate the common factor structure of the data. After the Horst modification, the item difficulties appear to have greater, rather than smaller, effects on the factorial structure.

On the positive side, the findings indicate that image, alpha, and principal components analysis of correlation matrices obtained from binary data matrices are all satisfactory methods of analysis without Horst's modification. This could be an important result, because it tends to confirm the empirical findings of Pruzek (1967) and Dingman (1958) that varying difficulties of binary items do not tend to be of great practical consequence, at least when derived clusters are relatively clear.

Research by the authors on the analysis of binary response data indicates that image analysis generally provides the most interpretable analyses of such data. Alpha analysis tends to produce a Heywood case if several first-order partial correlations of a particular variable are relatively high. Principal components analysis has a tendency to produce complex patterns for some data. In view of these findings, Horst's procedure does not appear to warrant further study.
APPENDIX





TABLE 1 



Permuted Binary Data Matrix 